1.  Introduction


As the amount of battlefield technology continues to increase, Soldiers face the daunting task of integrating diverse information across numerous devices. The growing information burden has spawned strong interest in “smart technology,” in which algorithms act as a digital assistant that streamlines information processing for the end user. Unfortunately, users typically have as many examples of the technology making errors as examples of it providing helpful assistance, such as the numerous auto-correct fail memes, comical questions incorrectly deduced by voice recognition software, or lane-keeping sensors on cars that alert for highway medians or chain link fences. These failures often originate from a rigid application of an algorithm. We posit that the technology would become smarter if real-time feedback from the user could modify the algorithm. Thus, research at the US Army Research Laboratory (ARL) targets the development of adaptive technology based on the real-time detection of human state (e.g., engagement or fatigue) to improve the integrated performance of humans and systems.

While the concept of human state is inherently nebulous, the intuition is that the configuration of the physiology underlying our behavior is predictive of upcoming performance. Take the example of driving while fatigued. When we are tired, we may swerve out of a lane unexpectedly. If sensors could use real-time physiology to detect a human state that is predictive of decreased vigilance, smart technology could intelligently assist with lane-keeping. In contrast, on days when the driver does not have physiological markers of decreased vigilance, the smart technology would not interfere with human driving, because swerving in and out of lanes may be necessary in a construction zone. This is just one example of how knowledge of the human state could enable technology to adapt more intelligently to the user’s real-time needs.

These adaptive technologies depend on a fundamental scientific achievement: the reliable detection of physiological states that are indicative of performance. Ongoing work at ARL examines physiological signals across the brain and body, including how well brain activity can predict upcoming task performance. In this research, brain activity is often measured from the scalp using electroencephalogram (EEG) sensors that record ongoing electrical activity generated when different brain regions are communicating information in support of behavioral performance. Fluctuations in the functional activity are then linked to variability in task performance, in an attempt to capture the physiological features of human state needed to predict real-time human task needs. Based on the success of convolutional neural networks (CNN) in extracting meaningful representations of data in image and speech processing, we would like to be able to apply these methods to EEG.1 Currently, the need for large amounts of training data poses the most significant obstacle to implementing these powerful algorithms on EEG.

In my summer internship, I learned about ongoing work at ARL to overcome this data challenge of EEG by using CNNs with minimal parameters. I used my background in mathematics to understand the algorithms involved in implementing EEGNet, an ARL-developed CNN for EEG data, and to apply EEGNet to a new problem.2 This report summarizes that work. CNNs are powerful function approximators, and in the first section of the technical approach, I review the functions that make up the EEGNet convolutional neural network. To understand EEGNet, I had to learn about maximum likelihood estimators, which allow us to pose learning problems as optimization problems. I derive the EEGNet objective function from a maximum likelihood estimate in the second section. In the third section of the technical approach, I describe stochastic gradient descent, the learning algorithm used with CNNs. Then, I describe the implementation of EEGNet and its application to an ARL data set that examines how human state changes following naturalistic sleep loss. The preliminary results effectively discriminate between human states, and this leads to a discussion of future directions in which I propose using EEGNet to ask further questions about physiological state changes and how they impact task performance.

2.  Technical  Approach 

We wanted to discriminate between the human states of individuals performing different cognitive tasks by observing their EEG activity alone. For this particular project, we used EEG recordings from individuals either at rest or performing an attentional bias task. Our goal was to discriminate between the 2 tasks using machine learning. We modeled the cognitive task as a Bernoulli trial from which we observed the EEG recording. This made the state discrimination problem one of state detection. Then, we posed our learning problem as one of posterior inference, in which we used EEGNet2 for the inference model. Finally, we used stochastic gradient descent as our learning algorithm to fit EEGNet to the observed data.

Approved for public release; distribution is unlimited.

In  this  section,  we  will  briefly  review  CNNs,  introduce  EEGNet,  derive  the  posterior 
inference  problem,  and  review  stochastic  gradient  descent. 

2.1  Convolutional  Neural  Networks 

CNNs are powerful machine learning algorithms that have been successfully applied to image processing, automatic speech recognition, speech translation, and natural language processing.1 More recently, they have even been applied to the analysis of EEG data.2 CNNs are a subset of algorithms known as deep learning. Deep learning essentially implies the composition of multiple nonlinear functions:

\[ f(x) = (f_n \circ \cdots \circ f_0)(x). \tag{1} \]
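As a small illustration of Eq. 1, the sketch below composes three scalar functions in Python. The helper `compose` and the stage functions `f0`, `f1`, and `f2` are hypothetical stand-ins chosen for this example; they are not the layers of any network discussed in this report.

```python
import math
from functools import reduce


def compose(*functions):
    """Return f_n o ... o f_0, where functions[0] = f_0 is applied first."""
    def composed(x):
        return reduce(lambda value, fn: fn(value), functions, x)
    return composed


def f0(x):
    return 2.0 * x + 1.0      # affine map (linear on its own)


def f1(x):
    return max(0.0, x)        # rectifier nonlinearity


f2 = math.tanh                # squashing nonlinearity

f = compose(f0, f1, f2)
result = f(0.5)               # equals tanh(max(0, 2 * 0.5 + 1)) = tanh(2)
```

Each stage is simple on its own; the expressive power comes from stacking them, which is the sense in which a deep network is a composition of functions.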

CNNs  comprise  particular  functions:  2-dimensional  convolutions,  pooling,  and  batch 
normalization. 

2.1.1  2-Dimensional  Convolution 

Convolutions are a common tool in signal processing. These are linear transformations that filter a signal to accentuate or attenuate particular features. In CNNs, we use the 2-dimensional generalization of the convolution $* : \mathbb{R}^{n} \times \mathbb{R}^{m} \to \mathbb{R}^{n \times m}$:

\[ (f * g)(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{m} f(i, j)\, g(x - i, y - j). \tag{2} \]

Here, $f$ is the signal and $g$ is the filter. During the learning process, we are selecting the filters that capture the most important features of the signal. In images, these features are often edges and blobs. A visualization of 2-dimensional convolution is shown in Fig. 1.
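Eq. 2 can be checked on a small example. The sketch below is a direct, unoptimized Python implementation of the full 2-dimensional discrete convolution with zero-based indices and implicit zero padding; the function name `conv2d_full` and the toy arrays are assumptions made for illustration. Deep learning libraries commonly implement the closely related cross-correlation (no filter flip), so this is not necessarily the exact operation used inside EEGNet.

```python
def conv2d_full(signal, kernel):
    """Full 2-D convolution: out[x][y] = sum over (i, j) of signal[i][j] * kernel[x-i][y-j]."""
    n, m = len(signal), len(signal[0])
    p, q = len(kernel), len(kernel[0])
    out = [[0.0] * (m + q - 1) for _ in range(n + p - 1)]
    for i in range(n):
        for j in range(m):
            for a in range(p):
                for b in range(q):
                    # kernel index (a, b) = (x - i, y - j), so x = i + a and y = j + b
                    out[i + a][j + b] += signal[i][j] * kernel[a][b]
    return out


signal = [[1, 2],
          [3, 4]]
kernel = [[0, 1],
          [1, 0]]
result = conv2d_full(signal, kernel)
# result == [[0, 1, 2], [1, 5, 4], [3, 4, 0]]
```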

2.1.2  Pooling 

Pooling,  or  downsampling,  reduces  the  dimensionality  of  the  signal  and  decreases 
the  sensitivity  of  the  algorithm  to  noise  in  the  signal.  Pooling  is  a  dimensionality 
reduction  technique: 

\[ P : \mathbb{R}^{n} \to \mathbb{R}^{p}. \tag{3} \]
Fig.  1  2-dimensional  discrete  convolution

Figure  reprinted  from  https://mlnotebook.github.io/post/CNN1/.3


Normally,  this  operation  is  either  the  average  or  maximum  of  adjacent  elements  of 
the  signal.  A  visualization  of  pooling  is  shown  in  Fig.  2. 
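A minimal sketch of the maximum variant of pooling follows, assuming non-overlapping blocks and a signal whose ragged edges are simply dropped. The function name `max_pool2d` and the 4 x 4 example values are made up for illustration.

```python
def max_pool2d(signal, pool_rows, pool_cols):
    """Maximum over non-overlapping pool_rows x pool_cols blocks (ragged edges dropped)."""
    out_rows = len(signal) // pool_rows
    out_cols = len(signal[0]) // pool_cols
    return [
        [
            max(
                signal[r * pool_rows + dr][c * pool_cols + dc]
                for dr in range(pool_rows)
                for dc in range(pool_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


signal = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [7, 0, 3, 3],
    [1, 2, 8, 6],
]
pooled = max_pool2d(signal, 2, 2)
# pooled == [[4, 5], [7, 8]]
```

The 16 input values are reduced to 4, and small perturbations that do not change the largest value in a block leave the output unchanged.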


Fig.  2  2-dimensional  max  pooling  operator 

Figure  reprinted  from  https://mlnotebook.github.io/post/CNN1/.3


2.1.3  Batch  Normalization 

Batch normalization is a technique for improving the learning process. It normalizes the features of signals passed to subsequent functions within the CNN so that each feature has approximately zero mean and unit variance within a batch, consistent with modeling the features as standard Gaussian. Figure 3 shows the distribution of observations before and after batch normalization.

Let $x_1^{(k)}, \ldots, x_m^{(k)}$ be a subset of the observations, where $k$ corresponds to the batch. More generally, these could be the input to any function $f_i$ in the CNN. We would like to model these observations as nearly independent and identically distributed (IID) Gaussian samples $X_i^{(k)} \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)$. In practice, we can estimate the mean and variance of each batch with the sample mean $\bar{x}^{(k)}$ and sample variance $(s^2)^{(k)}$ and normalize the observations accordingly, as in a z-score:

\[ z_i^{(k)} = \frac{x_i^{(k)} - \bar{x}^{(k)}}{\sqrt{(s^2)^{(k)}}}. \tag{4} \]

Learning parameters that stand in for the sample mean, $\beta \approx \bar{x}$, and the sample variance, $\gamma \approx s^2$, makes this procedure more computationally efficient and the learning smoother.4
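The z-score normalization of a single batch can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in this work: the function name `batch_normalize` is hypothetical, `eps` is a small constant added here for numerical stability, and `gamma` and `beta` follow the common convention of a learned scale and shift applied after normalization, which may differ from how the symbols are used in the text.

```python
import math


def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Z-score one batch of scalar features, then apply a scale (gamma) and shift (beta)."""
    m = len(batch)
    mean = sum(batch) / m
    variance = sum((x - mean) ** 2 for x in batch) / m   # biased sample variance
    return [gamma * (x - mean) / math.sqrt(variance + eps) + beta for x in batch]


batch = [2.0, 4.0, 6.0, 8.0]          # made-up feature values
normalized = batch_normalize(batch)

# The normalized batch has (approximately) zero mean and unit variance.
new_mean = sum(normalized) / len(normalized)
new_variance = sum((z - new_mean) ** 2 for z in normalized) / len(normalized)
```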

Fig.  3  Batch  normalization (panels: Before and After)

2.2  EEGNet 


EEGNet is a particular instance of a CNN with a set of functions optimized for EEG observations. The functions that compose EEGNet are shown in Fig. 4. It is divided into 4 layers, most of which have a 2-dimensional convolution, a pooling operation, and batch normalization. The first layer can be interpreted as a nonlinear spatial filter. The second layer can be interpreted as a nonlinear temporal filter. The third layer then aggregates the global features of the signal, and the fourth layer is a softmax regression over the N classes.


Layer  Input (C x T)   Operation              Output          Number of Parameters
1      C x T           16 x Conv1D (C x 1)    16 x 1 x T      16C + 16
       16 x 1 x T      BatchNorm              16 x 1 x T      32
       16 x 1 x T      Transpose              1 x 16 x T
       1 x 16 x T      Dropout (.25)          1 x 16 x T
2      1 x 16 x T      4 x Conv2D (2 x 32)    4 x 16 x T      4 x 2 x 32 + 4 = 260
       4 x 16 x T      BatchNorm              4 x 16 x T      8
       4 x 16 x T      Maxpool2D (2,4)        4 x 8 x T/4
       4 x 8 x T/4     Dropout (.25)          4 x 8 x T/4
3      4 x 8 x T/4     4 x Conv2D (8 x 4)     4 x 8 x T/4     4 x 4 x 8 x 4 + 4 = 516
       4 x 8 x T/4     BatchNorm              4 x 8 x T/4     8
       4 x 8 x T/4     Maxpool2D (2,4)        4 x 4 x T/16
       4 x 4 x T/16    Dropout (.25)          4 x 4 x T/16
4      4 x 4 x T/16    Softmax Regression     N               TN + N
Total                                                         16C + N(T + 1) + 840

Fig.  4  Layers  of  EEGNet  model

Table reproduced from Lawhern et al.2
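The parameter counts in Fig. 4 can be tallied with a short script. The function name `eegnet_parameter_counts` and the example sizes (64 channels, 128 time samples, 2 classes) are assumptions chosen for illustration; they are not the dimensions of the data used in this report. The sketch also assumes T is divisible by 16, so that the flattened input to the fourth layer has 4 x 4 x T/16 = T features.

```python
def eegnet_parameter_counts(channels, samples, classes):
    """Per-layer trainable parameter counts following the Fig. 4 table."""
    counts = {
        # 16 spatial filters of size C x 1 with biases, plus BatchNorm
        "layer1": (16 * channels + 16) + 32,
        # 4 filters of size 2 x 32 with biases, plus BatchNorm
        "layer2": (4 * 2 * 32 + 4) + 8,
        # 4 filters of size 8 x 4 over 4 input maps with biases, plus BatchNorm
        "layer3": (4 * 4 * 8 * 4 + 4) + 8,
        # softmax regression on T features: weights plus biases
        "layer4": samples * classes + classes,
    }
    counts["total"] = sum(counts.values())
    return counts


counts = eegnet_parameter_counts(channels=64, samples=128, classes=2)
closed_form = 16 * 64 + 2 * (128 + 1) + 840   # 16C + N(T + 1) + 840
```

For these hypothetical sizes the total is 2,122 parameters, which is small compared with typical image-processing CNNs and is the property that makes the model attractive when training data are limited.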




2.3  Learning  Objective 


Following the maximum likelihood formulation of Duda et al. in Pattern Classification,5 we modeled the cognitive task as distributed according to a Bernoulli distribution conditioned on the associated realization of the EEG recording. Our goal was then to infer the cognitive task given the EEG realization, a statistical inference task. We used EEGNet to estimate the Bernoulli parameter for each EEG realization so that we could pose the inference problem as a parameter estimation problem. This allowed us to use a maximum likelihood estimate to formulate the learning problem as an optimization problem.

Let $X_1, \ldots, X_N$ be IID observations of EEG recordings and $Y_1, \ldots, Y_N$ be the associated cognitive tasks. As mentioned above, let $Y_i \mid X_i \sim \mathrm{Bern}(p(X_i))$:

\[ f_{Y|X}(y_i \mid x_i) = p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i}. \tag{5} \]


We  will  consider  the  log  likelihood  of  all  EEG  observations: 

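\begin{align}
\ln \prod_{i=1}^{N} f_{Y|X}(y_i \mid x_i) &= \ln \prod_{i=1}^{N} \left[ p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i} \right] \tag{6} \\
&= \sum_{i=1}^{N} \ln \left[ p(x_i)^{y_i} (1 - p(x_i))^{1 - y_i} \right] \tag{7} \\
&= \sum_{i=1}^{N} \left[ y_i \ln p(x_i) + (1 - y_i) \ln (1 - p(x_i)) \right]. \tag{8}
\end{align}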


Now, we want to maximize the log likelihood. We can consider the joint probability distribution $f(x, y)$ for the random variables $X, Y$ and maximize the expected value

\[ \max_{p} \mathbb{E}_{x, y \sim f} \left[ y \ln p(x) + (1 - y) \ln (1 - p(x)) \right]. \tag{9} \]

If $p = p(X; \omega)$ is a function of both the EEG observation and parameters $\omega \in \Omega$, then we have the more commonly known binary cross-entropy loss:

\[ \mathcal{L}(\omega) = \min_{\omega} \mathbb{E} \left[ -y \ln p(x; \omega) - (1 - y) \ln (1 - p(x; \omega)) \right]. \tag{10} \]
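The binary cross-entropy can be evaluated on a handful of labels to see how it rewards confident, correct predictions. The sketch below replaces the expectation with a sample average; the function name `binary_cross_entropy`, the clipping constant `eps`, and the example probabilities are assumptions made for illustration.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Sample average of -y ln p - (1 - y) ln(1 - p), with p clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uninformed = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])   # equals ln 2
```

A model that always predicts 0.5 incurs a loss of ln 2 regardless of the labels, so a trained model should fall below that value.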
2.4  Stochastic  Gradient  Descent


Since we were able to pose our state detection problem as an optimization problem, we selected the commonly used stochastic gradient descent algorithm for our learning algorithm.6 This is a first-order method, which scales well to large data sets and complex models.

The gradient descent algorithm is a powerful unconstrained optimization method with convergence guarantees for strongly convex functions. It is an iterative technique that uses only knowledge of the gradient of the function. Let $f$ be a function of $\omega$ and let $0 < \eta_k < 1$ be the learning rate at iteration $k$. The algorithm is as follows:

\[ \omega_{k+1} \leftarrow \omega_k - \eta_k \nabla_\omega f(\omega_k). \tag{11} \]
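The update in Eq. 11 can be traced on a one-dimensional strongly convex function. In this sketch the objective f(w) = (w - 3)^2, the constant learning rate of 0.1, and the function name `gradient_descent` are all assumptions chosen for illustration.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Iterate omega <- omega - learning_rate * gradient(omega) for a fixed number of steps."""
    omega = start
    for _ in range(steps):
        omega = omega - learning_rate * gradient(omega)
    return omega


def quadratic_gradient(w):
    return 2.0 * (w - 3.0)    # gradient of f(w) = (w - 3)^2, minimized at w = 3


minimizer = gradient_descent(quadratic_gradient, start=0.0, learning_rate=0.1, steps=100)
```

For this objective each step shrinks the distance to the minimizer by a factor of 0.8, so the iterate converges geometrically to 3.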


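For objectives that have the form $\mathcal{L}(\omega) = \mathbb{E}_X f(X; \omega)$, gradient descent can be computationally expensive. An alternative approach is stochastic gradient descent, which uses Monte Carlo sampling to estimate the gradient:

\[ \nabla_\omega \mathbb{E}_X f(X; \omega) \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_\omega f(X_i; \omega), \tag{12} \]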

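where $X_i \sim f_X$. Therefore, we can estimate the gradient of our maximum likelihood objective as follows:

\[ \nabla_\omega \mathbb{E}_{x, y \sim f} \left[ y \ln p(x; \omega) + (1 - y) \ln (1 - p(x; \omega)) \right] \approx \nabla_\omega \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \ln p(x_i; \omega) + (1 - y_i) \ln (1 - p(x_i; \omega)) \right], \tag{13} \]

which yields the following iterative algorithm:

\[ \omega_t \leftarrow \omega_{t-1} + \eta_t \nabla_\omega \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \ln p(x_i; \omega) + (1 - y_i) \ln (1 - p(x_i; \omega)) \right] \Big|_{\omega = \omega_{t-1}}. \tag{14} \]

The update adds the gradient because the log likelihood is being maximized; equivalently, it subtracts the gradient of the binary cross-entropy loss in Eq. 10.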
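The iteration in Eq. 14 can be sketched for the simplest possible model, in which p(x; ω) is a logistic function of a single scalar feature. This is a stand-in for EEGNet, not EEGNet itself: the model, the synthetic Gaussian data, the batch size, the learning rate, and all function names below are assumptions made for illustration. For this model the gradient of the log likelihood with respect to the weight is (y - p) x, and with respect to the bias is (y - p).

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(xs, ys, learning_rate=0.5, batch_size=8, steps=500, seed=0):
    """Maximize the mini-batch log likelihood of p(x) = sigmoid(w * x + b) by gradient ascent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = [rng.randrange(len(xs)) for _ in range(batch_size)]   # Monte Carlo sample
        residuals = [ys[i] - sigmoid(w * xs[i] + b) for i in batch]   # y_i - p(x_i)
        grad_w = sum(r * xs[i] for r, i in zip(residuals, batch)) / batch_size
        grad_b = sum(residuals) / batch_size
        w += learning_rate * grad_w    # "+" because the log likelihood is maximized
        b += learning_rate * grad_b
    return w, b


# Synthetic two-class data: class 0 centered at -1, class 1 centered at +1.
data_rng = random.Random(1)
xs = [data_rng.gauss(-1.0, 1.0) for _ in range(200)] + [data_rng.gauss(1.0, 1.0) for _ in range(200)]
ys = [0] * 200 + [1] * 200

w, b = sgd_logistic(xs, ys)
accuracy = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in zip(xs, ys)) / len(xs)
```

Each step touches only 8 of the 400 observations, which is what keeps the cost per update low when the data set or model is large.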
3.  Implementation  and  Analysis  Using  EEGNet


Using Python, we implemented EEGNet in Keras7 with a TensorFlow8 back end according to Fig. 4. We used the default parameters for the Adam9 optimizer (a variant of stochastic gradient descent). We trained the algorithm for 10 epochs (passes over the training data).

Lawhern  et  al.2  used  EEGNet  on  data  from  standard  brain-computer  interface  (BCI) 
paradigms  (i.e.,  rapid  serial  visual  presentation,  motor  imagery,  and  visually  evoked 
potentials).  In  this  project,  we  have  examined  whether  EEGNet  can  be  extended  to 
detect  more  nebulous  cognitive  states  that  capture  the  influence  of  naturalistic  sleep 
loss  on  task  performance. 

We tested the approach on data from a previously collected ARL study called Cognitive Resilience and Sleep History (CRASH). Participants (N=29) provided sleep history over an 18-week time period, including objective measurements of sleep duration and quality from actigraph wrist watches and subjective measurements from daily web-based questionnaires. They came into the laboratory every 2 weeks for a 4-h experimental session where brain data were collected while they performed 5 cognitive tasks and a resting state scan. In this novel data set, we have 8 brain-behavior sessions to assess the impact of naturalistic sleep loss on task performance over an 18-week timeframe.

To test the extension of EEGNet to state detection, we compared EEG data between the resting state, where the participant was able to mind wander as desired, and the attentional bias task, where the participant had to discriminate letters that followed emotional faces. We selected one subject from the CRASH data set and included observations from all 8 recording sessions. We separated the data into nonoverlapping epochs of 500 ms. The data were previously processed using ARL’s standard pipeline.10
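Cutting a recording into nonoverlapping 500-ms epochs amounts to slicing along the time axis. The sketch below works on a single channel stored as a list of samples; the sampling rate of 200 Hz, the function name `split_into_epochs`, and the choice to drop a trailing partial epoch are assumptions for illustration, since the sampling rate and edge handling of the actual data are not stated here.

```python
def split_into_epochs(samples, sampling_rate_hz, epoch_ms=500):
    """Cut a 1-D sample sequence into nonoverlapping epochs, dropping any trailing remainder."""
    epoch_length = int(round(sampling_rate_hz * epoch_ms / 1000.0))   # samples per epoch
    n_epochs = len(samples) // epoch_length
    return [samples[k * epoch_length:(k + 1) * epoch_length] for k in range(n_epochs)]


recording = list(range(1050))                  # 5.25 s of made-up data at 200 Hz
epochs = split_into_epochs(recording, sampling_rate_hz=200)
# 10 epochs of 100 samples each; the last 50 samples are discarded
```

For multichannel EEG the same slicing would be applied to every channel, giving one C x T array per epoch.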

We randomly partitioned our data into training (80%) and testing (20%) sets 10 times and reported the performance as the area under the receiver operating characteristic curve (AUC). The average AUC observed over the 10 iterations was 90% (see Fig. 5 for the results). This performance is well above chance (an AUC of 50%) and indicates that EEGNet shows promise for our extension of the method to discriminate cognitive states beyond those of standard BCI paradigms.
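The evaluation protocol, repeated random 80/20 partitions scored by AUC, can be sketched without any machine learning library. Everything in the example is an assumption for illustration: the scores are synthetic numbers rather than EEGNet outputs, no model is actually trained, and the helper names `auc` and `random_split` are hypothetical. The AUC is computed from its rank interpretation, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def random_split(n_items, test_fraction, rng):
    """Shuffle indices and return (train_indices, test_indices)."""
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_test = round(n_items * test_fraction)
    return indices[n_test:], indices[:n_test]


# Synthetic classifier outputs: scores tend to be higher for class 1.
rng = random.Random(0)
labels = [0] * 200 + [1] * 200
scores = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(1.8, 1.0) for _ in range(200)]

test_aucs = []
for _ in range(10):
    train_idx, test_idx = random_split(len(labels), 0.2, rng)
    # A real pipeline would fit the model on train_idx here; this sketch only scores test_idx.
    test_aucs.append(auc([labels[i] for i in test_idx], [scores[i] for i in test_idx]))

mean_auc = sum(test_aucs) / len(test_aucs)
```

Averaging over several partitions reduces the dependence of the reported AUC on any single lucky or unlucky split.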
Fig.  5  Receiver operating characteristic curve for one random partition of the data (panel title: Receiver Operating Characteristic Curve: Cognitive Tasks)


4.  Conclusion  and  Future  Directions 

In my summer internship project, I learned how recent work at ARL has made it possible to apply CNNs to EEG data. Following the work of Lawhern et al.,2 I implemented EEGNet, a CNN with relatively few parameters, and used it to classify human states from EEG recordings in a single subject from the CRASH data set. To complete this project, I learned how CNNs compose convolutions and pooling functions to build representations of data, how to use maximum likelihood estimators to pose learning problems as optimization problems, and how to use stochastic gradient descent to solve optimization problems. Our preliminary results indicate that EEGNet successfully discriminates between cognitive tasks.

In future work, I would like to use EEGNet to detect the presence of more complicated states. As described in the introduction, state detection will facilitate adaptive technologies that can respond to changes in the user’s state. In this preliminary work, we only discriminated the performance of one cognitive task from a rest state. However, the CRASH data set has recordings from several sessions for each subject. I would like to use EEGNet to discriminate between sessions for the same subject. Because the sessions differ by sleep history, the ability to discriminate by session would indicate that EEGNet could discriminate the sleep history of a user. This could be used in future adaptive technologies to detect user fatigue and the poor performance likely to follow.
List  of  Symbols,  Abbreviations,  and  Acronyms

ARL      US Army Research Laboratory
AUC      area under the receiver operating characteristic curve
BCI      brain-computer interface
CNN      convolutional neural network
CRASH    cognitive resilience and sleep history
EEG      electroencephalogram
EEGNet   ARL-developed CNN for EEG data
IID      independent and identically distributed

